backup: fix AdminUnsplit race in online restore download cleanup#169758
backup: fix AdminUnsplit race in online restore download cleanup#169758msbutler wants to merge 2 commits intocockroachdb:masterfrom
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
78d3dc8 to
34cec46
Compare
| besteffort.Warning(ctx, "or-unsplit", func(_ context.Context) error { | ||
| return errors.Wrapf(err, "failed to unsplit %s", sp) | ||
| }) |
There was a problem hiding this comment.
This seems like a bit of a weird usage of besteffort.Warning. I think it'd be a bit more idiomatic to do:
besteffort.Warning(ctx, "or-unsplit", func(ctx context.Context) error {
if err := execCfg.DB.AdminUnsplit(ctx, sp.Key); err != nil && !kv.pb.IsKeyNotStartOfRange(err) {
return err
}
return nil
})| // IsKeyNotStartOfRange returns true if the error indicates that an AdminUnsplit | ||
| // request targeted a key that is not the start of a range. It falls back to | ||
| // string matching for compatibility with older nodes that produce the untyped | ||
| // error. |
There was a problem hiding this comment.
Should we leave a TODO here to signal that the string matching can be removed in 27.1?
Introduce a typed sentinel error for the "key is not the start of a range" condition returned by AdminUnsplit. This replaces brittle strings.Contains checks with errors.Is, while including a string fallback in the IsKeyNotStartOfRange helper for mixed-version compatibility. Migrate the GC job's unsplitRangesInSpan to use the new helper. Epic: none Fixes: cockroachdb#169637 Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
A race between the GC job and the online restore download job could cause AdminUnsplit to fail with "key is not the start of a range" when the GC job merges ranges that the download job later tries to unsplit. This caused retry spiraling and eventual job failure. Make unstickRestoreSpans swallow "not start of range" errors (the range already merged, so the sticky bit is gone) and use besteffort.Warning for other errors, since unsplitting is cleanup that should not block the restore. Epic: none Fixes: cockroachdb#169637 Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
34cec46 to
7594580
Compare
|
Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link) |
|
going to try a different approach |
kvpb: add ErrKeyNotStartOfRange sentinel for AdminUnsplit
Introduce a typed sentinel error for the "key is not the start of a
range" condition returned by AdminUnsplit. This replaces brittle
strings.Contains checks with errors.Is, while including a string
fallback in the IsKeyNotStartOfRange helper for mixed-version
compatibility.
Migrate the GC job's unsplitRangesInSpan to use the new helper.
Epic: none
Fixes: #169637
Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com
backup: make unstickRestoreSpans best-effort
A race between the GC job and the online restore download job could
cause AdminUnsplit to fail with "key is not the start of a range" when
the GC job merges ranges that the download job later tries to unsplit.
This caused retry spiraling and eventual job failure.
Make unstickRestoreSpans swallow "not start of range" errors (the
range already merged, so the sticky bit is gone) and use
besteffort.Warning for other errors, since unsplitting is cleanup
that should not block the restore.
Epic: none
Fixes: #169637